Out-of-Order Memory Accesses Using a Load Wait Buffer
Authors
Abstract
Many dynamic scheduling techniques take advantage of out-of-order instruction execution to hide memory access latency. However, as the gap between processor and memory speeds widens, delays in the load-store queue (LSQ) become an increasingly significant bottleneck. One way to mitigate these delays is to allow loads and stores to execute and retire from the LSQ out of order. Unfortunately, when the LSQ fills with pending loads, other loads and stores are prevented from entering the buffer and therefore cannot be retired. In addition to out-of-order execution of loads and stores, we propose temporarily moving long-latency pending loads to a separate load wait buffer (LWB), similar to the waiting instruction buffer (WIB) proposed by Lebeck et al. [1]. Simulation results show successive increases in benchmark IPC with out-of-order loads, with out-of-order loads and stores, and with out-of-order loads and stores plus an LWB. The design with the LWB achieves an IPC speedup of up to 303%.
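The buffering idea can be illustrated with a toy cycle-counting model. The Python sketch below is not the authors' simulator: the names (`MemOp`, `simulate`, `make_workload`), the one-dispatch-per-cycle rule, the latency values, and the threshold for what counts as a long-latency load are all hypothetical. It also assumes loads and stores never alias, which a real out-of-order LSQ must verify before letting a store retire early. Completed entries retire out of order, and when the LSQ is full, still-pending long-latency loads are moved into the LWB so that later operations can enter the LSQ. For simplicity, a parked load retires directly from the LWB once it completes; the abstract describes the removal as temporary, so an actual design might instead return the load to the LSQ.

```python
from dataclasses import dataclass


@dataclass
class MemOp:
    """One load or store in the toy model (hypothetical structure)."""
    op_id: int
    is_load: bool
    latency: int            # cycles from LSQ entry until the access completes
    issue_cycle: int = -1   # cycle at which the op entered the LSQ

    def done(self, cycle: int) -> bool:
        return cycle >= self.issue_cycle + self.latency


def simulate(ops, lsq_size, lwb_size, long_latency, use_lwb):
    """Return the number of cycles needed to retire every op.

    Assumptions: one op enters the LSQ per cycle in program order,
    loads and stores never alias, and any completed entry may retire
    out of order.
    """
    pending = list(ops)       # ops not yet dispatched, in program order
    lsq, lwb = [], []
    retired = 0
    cycle = 0
    while retired < len(ops):
        # 1. Out-of-order retirement: completed entries leave either buffer.
        for buf in (lsq, lwb):
            finished = [op for op in buf if op.done(cycle)]
            for op in finished:
                buf.remove(op)
            retired += len(finished)
        # 2. If the LSQ is full, park pending long-latency loads in the LWB.
        if use_lwb and len(lsq) == lsq_size:
            for op in list(lsq):
                if len(lwb) == lwb_size:
                    break
                if op.is_load and op.latency >= long_latency:
                    lsq.remove(op)
                    lwb.append(op)
        # 3. Dispatch the next op if the LSQ has a free entry.
        if pending and len(lsq) < lsq_size:
            op = pending.pop(0)
            op.issue_cycle = cycle
            lsq.append(op)
        cycle += 1
    return cycle


def make_workload(n, miss_every, miss_latency, hit_latency):
    """Even-numbered ops are loads; every miss_every-th op misses (hypothetical)."""
    return [MemOp(i, is_load=(i % 2 == 0),
                  latency=miss_latency if i % miss_every == 0 else hit_latency)
            for i in range(n)]


if __name__ == "__main__":
    params = dict(lsq_size=8, lwb_size=16, long_latency=50)
    base = simulate(make_workload(64, 4, 100, 2), use_lwb=False, **params)
    with_lwb = simulate(make_workload(64, 4, 100, 2), use_lwb=True, **params)
    print(f"cycles without LWB: {base}, with LWB: {with_lwb}")
```

With these illustrative parameters, the run without the LWB stalls once the LSQ holds eight outstanding misses, whereas the LWB run keeps dispatching, so it should finish in fewer cycles. The numbers say nothing about the speedup reported in the abstract.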
Similar resources
Autonomous Instruction Memory Equipped with Dynamic Branch Handling Capability
portable information appliances, the extraordinary power consumption ratio of memory accesses promotes importance of efficient memory system design to an ultimate. We address the following issues: how to minimize memory bandwidth requirement for instruction accesses, and how to minimize memory access delay, again for instruction accesses. Then we propose to move dynamic branch handler (e.g., br...
Accurate analysis of memory latencies for WCET estimation
In recent years, many researchers have proposed solutions to estimate the Worst-Case Execution Time of a critical application when it is run on modern hardware. Several schemes commonly implemented to improve performance have been considered so far in the context of static WCET analysis: pipelines, instruction caches, dynamic branch predictors, execution cores supporting out-of-order execution...
Data Prefetching by Exploiting Global and Local Access Patterns
This paper proposes a new hardware prefetcher that extends the idea of the Global History Buffer (GHB) originally proposed in [1]. We augment the GHB with several Local History Buffers (LHBs), which keep the memory access information for selective program counters. These buffers can then be queried on cache accesses to predict future memory accesses and enable data prefetching using novel detec...
Optimized On-Chip-Pipelining for Memory-Intensive Computations on Multi-Core Processors with Explicit Memory Hierarchy
Limited bandwidth to off-chip main memory tends to be a performance bottleneck in chip multiprocessors, and this will become even more problematic with an increasing number of cores. Especially for streaming computations where the ratio between computational work and memory transfer is low, transforming the program into more memory-efficient code is an important program optimization. On-chip pi...
Failure-Oblivious Computing and Boundless Memory Blocks
Memory errors are a common cause of incorrect software execution and security vulnerabilities. We have developed two new techniques that help software continue to execute successfully through memory errors: failure-oblivious computing and boundless memory blocks. The foundation of both techniques is a compiler that generates code that checks accesses via pointers to detect out of bounds accesse...